# REPLICATION ARCHIVE

This replication archive contains all scripts and data necessary to replicate the analysis in "No Longer Conforming to Stereotypes? Gender, Political Style, and Parliamentary Debate in the UK".

## MANUSCRIPT AND SUPPLEMENTAL INFORMATION

The manuscript and supplemental materials are generated from RMarkdown documents, with a style "header" .tex file for translation to LaTeX and a .bib bibliography archive.  These can be "knit" in RStudio, but only after first running the full sequence of R scripts listed below. 

- manuscript_final.Rmd - manuscript RMarkdown file
- supplementary_materials_final.Rmd - supplemental information RMarkdown file
- header.tex - manuscript RMarkdown pdf template
- test.bib - bibliography file

## SCRIPTS

In order to re-run the models described in the paper, you will need to execute the following files (in this order). You can execute the full sequence of scripts by running the "00_master.R" script.

- 01_prep_fcm.R

	- Constructs the feature co-occurrence matrix required for estimating word embeddings
- 02_estimate_embeddings.R

	- Estimates word embeddings from the parliamentary corpus

- 03_prep_dictionaries.R

	- Constructs word-scores for each word in the corpus using the estimated embeddings and a set of seed dictionaries
- 04_apply_dictionaries.R

	- Scores each speech in the corpus using the word-scores
- 05_apply_repetition.R

	- Calculates the repetition of each speech in the corpus using a compression approach
- 06_apply_complexity.R

	- Calculates the complexity of each speech in the corpus using the Flesch-Kincaid measure
- 07_prep_meta_data.R

	- Adds a series of metadata variables to the style scores for each speech
- 08_stan.R

	- Estimates the hierarchical models in the paper
- 09_stan_analysis.R

	- Creates various outputs from the estimated hierarchical models
- 10_topic_model_prep.R

	- Estimates a series of topic models for the corpus
- 11_topic_model_analysis.R

	- Creates outputs for the topic model analysis
- 12_diagnostics.R

	- Various ancillary analyses
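
The compression approach used for the repetition measure in step 05 can be illustrated with a minimal sketch. This is not the code from 05_apply_repetition.R: the function names `compression_ratio` and `repetition_score`, the choice of gzip, and the "one minus the ratio" scaling are all assumptions made for illustration. The underlying idea is only that repetitive text compresses to a smaller fraction of its original size.

```r
# Illustrative sketch only; not the implementation in 05_apply_repetition.R.
# Assumption: repetition is proxied by how well a speech compresses.

# Ratio of compressed size to original size (in bytes) for a single string.
compression_ratio <- function(text, type = "gzip") {
  raw_text <- charToRaw(enc2utf8(text))
  if (length(raw_text) == 0) return(NA_real_)  # undefined for empty input
  length(memCompress(raw_text, type = type)) / length(raw_text)
}

# Hypothetical score: higher values mean more repetitive text.
# Very short texts can score below zero because of compression overhead.
repetition_score <- function(text) {
  1 - compression_ratio(text)
}

# Toy inputs (invented for this example, not taken from the corpus)
repetitive_speech <- paste(rep("order order", 200), collapse = " ")
set.seed(1)
varied_text <- paste(sample(letters, nchar(repetitive_speech), replace = TRUE),
                     collapse = "")

repetition_score(repetitive_speech)
repetition_score(varied_text)
```

Because compression overhead matters for short inputs, a measure of this kind is more informative for longer speeches.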

In addition to the data resources described below, the scripts above also require the Stan code for each of the models. These are contained in the following folder:

- stancode/
	- model2_control.R
	- model2.R

The total run time for all models, which includes many robustness checks and alternative specifications described in the supplementary materials, will be about 6 days on a modern computer with SSD storage and at least 16GB memory, so plan accordingly.  
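
Given the long run time, it can be useful to run the numbered scripts one at a time and record how long each takes. The following is a minimal sketch under assumptions: it is not the content of 00_master.R, the helper `run_in_order` is hypothetical, and it assumes the scripts sit in the current working directory.

```r
# Illustrative sketch only; 00_master.R in the archive is the authoritative runner.
scripts <- c(
  "01_prep_fcm.R", "02_estimate_embeddings.R", "03_prep_dictionaries.R",
  "04_apply_dictionaries.R", "05_apply_repetition.R", "06_apply_complexity.R",
  "07_prep_meta_data.R", "08_stan.R", "09_stan_analysis.R",
  "10_topic_model_prep.R", "11_topic_model_analysis.R", "12_diagnostics.R"
)

# Hypothetical helper: runs each script in the given order, stops at the first
# missing file, and returns the elapsed seconds per script.
run_in_order <- function(scripts, runner = source) {
  timings <- numeric(0)
  for (s in scripts) {
    if (!file.exists(s)) stop("Missing script: ", s)
    elapsed <- system.time(runner(s))[["elapsed"]]
    message(sprintf("%s finished in %.1f seconds", s, elapsed))
    timings[s] <- elapsed
  }
  timings
}

# Uncomment to run the full sequence from the archive's root folder:
# timings <- run_in_order(scripts)
```

Stopping at the first missing file avoids spending hours on early steps only to fail later on a path problem.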

## DATA

- debates.Rdata # Raw debate texts and associated metadata from https://reshare.ukdataservice.ac.uk/854292/
- validation_data.Rdata # Validation data from pairwise comparisons of debate sentences
- dictionaries/
	- LH_aggression_seed.csv # Bespoke aggression seed dictionary
	- liwc.Rdata # LIWC dictionaries
	- parliamentary_jargon.csv # Dictionary of parliamentary jargon
	- seed_words_anecdote.csv # Bespoke human narrative dictionary
- mp_data/
	- final_occupation_education_v2.csv # Data on MP occupation and education levels


## MISC

- app.jpg - Screenshot of the app used for the human validation

## SOFTWARE

- R v4.0.2  
- RStudio v1.3.1056 
- quanteda v3.0.0
- quanteda.dictionaries v0.22 (from GitHub: kbenoit/quanteda.dictionaries)
- tidyverse v1.3.0
- margins v0.3.26
- stm v1.3.6
- data.table v1.13.6
- scales v1.1.1
- sandwich v2.5.1
- ggplot2 v3.3.3
- rstan v2.21.2
- bayesplot v1.7.2
- plyr v1.8.6
- texreg v1.37.5
- plm v2.2-3
- corrplot v0.84
- quanteda.textstats v0.94
- text2vec v0.6
- ukbabynames v0.1.1
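
Before running the scripts, the installed package versions can be compared against those listed above. The sketch below is an assumption-laden convenience, not part of the archive: `check_versions` is a hypothetical helper, and only a few of the packages are included in the example vector.

```r
# Illustrative sketch only; check_versions() is not part of the archive.
# A subset of the packages listed above, with the versions stated there.
required <- c(
  quanteda   = "3.0.0",
  stm        = "1.3.6",
  data.table = "1.13.6",
  rstan      = "2.21.2",
  plm        = "2.2-3"
)

# Returns one row per package: the required version, the installed version
# (NA if the package is not installed), and whether the two strings match.
check_versions <- function(required) {
  installed <- vapply(names(required), function(pkg) {
    tryCatch(as.character(utils::packageVersion(pkg)),
             error = function(e) NA_character_)
  }, character(1))
  data.frame(
    package   = names(required),
    required  = unname(required),
    installed = unname(installed),
    matches   = !is.na(installed) & unname(installed) == unname(required),
    stringsAsFactors = FALSE
  )
}

check_versions(required)
```

A mismatch does not necessarily break replication, but it is a useful first thing to check if results differ.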